We investigated the presence of cognitive biases in judgment, and techniques for obtaining valid probability assessments, through five psychological studies conducted in a laboratory setting. The studies involved participants from the general pool and participants with specialized military knowledge. The former were recruited from UGA’s Psychology Research Pool and the latter from the Army and Air Force ROTC programs at UGA.

Studies 1-3 took place in the context of the Georgia testbed for autonomous control of vehicles (GaTAC). GaTAC is a computer simulation framework for evaluating autonomous control of aerial robotic vehicles such as UAVs (see Fig. 1 in the attached PowerPoint file). It provides a low-cost, open-source laboratory alternative to highly complex and expensive simulation infrastructures. GaTAC deploys multiple instances of the open-source flight simulator FlightGear, using hyper-realistic 3D terrain data, on a networked cluster of computing platforms. It also provides a flight dynamics software module that translates high-level navigational actions into commands for the flight control surfaces. GaTAC was developed in the THINC lab in the Department of Computer Science at UGA. Previous investigations into biases in judgment used simple settings in which the relevant probabilities were few and explicitly given, and the required calculations were often simple. GaTAC, by contrast, allowed us to maximize the ecological validity of the studies within a laboratory setting.
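The report describes the flight dynamics module only at a high level. The sketch below is purely illustrative and is not GaTAC's actual module. It shows one generic way a high-level navigational action, such as "fly toward heading H at altitude A", could be translated into control-surface commands using simple proportional control. Several details are assumptions: the function names, the gains, and the convention that deflections are normalized to [-1, 1], with positive values meaning right roll or climb.

```python
from dataclasses import dataclass


@dataclass
class ControlCommand:
    # Normalized deflections in [-1, 1]. Sign convention is an assumption:
    # positive aileron = roll right, positive elevator = climb.
    aileron: float
    elevator: float


def clamp(x: float, limit: float = 1.0) -> float:
    """Saturate a command to [-limit, limit]."""
    return max(-limit, min(limit, x))


def heading_error(current_deg: float, target_deg: float) -> float:
    """Signed heading error in degrees, wrapped into [-180, 180)."""
    return (target_deg - current_deg + 180.0) % 360.0 - 180.0


def translate_action(target_heading_deg: float, target_alt_ft: float,
                     heading_deg: float, alt_ft: float,
                     k_heading: float = 0.02,
                     k_alt: float = 0.005) -> ControlCommand:
    """Hypothetical proportional mapping from a high-level action (target
    heading and altitude) to surface commands. Gains are illustrative only."""
    aileron = clamp(k_heading * heading_error(heading_deg, target_heading_deg))
    elevator = clamp(k_alt * (target_alt_ft - alt_ft))
    return ControlCommand(aileron=aileron, elevator=elevator)


if __name__ == "__main__":
    cmd = translate_action(target_heading_deg=90, target_alt_ft=5000,
                           heading_deg=80, alt_ft=4800)
    print(cmd)
```

Wrapping the heading error keeps the aircraft turning the short way round, for example 20 degrees right from 350 to 10 rather than 340 degrees left. Clamping keeps large errors from commanding impossible deflections.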

Study 1 investigated whether participants' verbal probability assessments are unreliable and sought ways to validate them. The primary alternative to direct verbal expression is to elicit preferences between betting on the outcome of the predicted event and betting on events with clear, objectively specified probabilities. The study examined whether the verbal reports are consistent with the degrees of uncertainty inferred from the choice data.
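The choice-based alternative can be made concrete with a titration procedure. The participant chooses between two bets: one that pays if the predicted event occurs, and one that pays with a known probability q. The value of q is then adjusted until the participant is indifferent between them. The sketch below uses bisection on q and a simulated, noise-free chooser. The bisection schedule, the function names and the deterministic chooser are assumptions for illustration, not the study's protocol.

```python
from typing import Callable


def matching_probability(prefers_event_bet: Callable[[float], bool],
                         n_steps: int = 10) -> float:
    """Estimate the q at which a participant is indifferent between betting
    on the event and betting on a lottery that wins with probability q.

    Preferring the event bet at q implies subjective P(event) > q, so the
    search moves q upward; otherwise it moves q downward.
    """
    lo, hi = 0.0, 1.0
    for _ in range(n_steps):
        q = (lo + hi) / 2
        if prefers_event_bet(q):
            lo = q
        else:
            hi = q
    return (lo + hi) / 2


def simulated_participant(belief: float) -> Callable[[float], bool]:
    """Hypothetical noise-free chooser whose choices reflect `belief`."""
    return lambda q: belief > q


if __name__ == "__main__":
    verbal_report = 0.9  # hypothetical stated probability
    inferred = matching_probability(simulated_participant(0.7))
    print(f"choice-inferred {inferred:.3f}, verbal {verbal_report}, "
          f"gap {verbal_report - inferred:+.3f}")
```

Study 1 was designed to detect inconsistencies of this kind: a persistent gap between the verbal report and the choice-inferred value, such as the simulated 0.9 versus roughly 0.7 here.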

Study 2 investigated whether general-pool participants might fail to report their true probability assessments honestly, and sought ways to correct this. It examined whether proper scoring rules are needed to obtain honest and reliable reports of subjective probabilities.

Study 3 investigated the same possible lack of honesty among ROTC-pool participants and sought ways to correct it. It again examined whether proper scoring rules are needed to obtain honest and reliable reports of subjective probabilities.
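The distinction between proper and non-proper scoring rules can be checked numerically. Suppose a participant believes a binary event occurs with probability p. The expected Brier penalty of reporting r is p(1 - r)^2 + (1 - p)r^2, which is minimized only at r = p, so honest reporting is optimal. The report does not define 0-1 scoring. The sketch assumes it awards one point when the outcome the participant rated as more likely occurs. Under that assumption, every report on the same side of 0.5 earns the same expected score, so the exact value reported is not rewarded.

```python
def expected_brier(report: float, belief: float) -> float:
    """Expected Brier penalty (lower is better) for a binary event."""
    return belief * (1 - report) ** 2 + (1 - belief) * report ** 2


def expected_zero_one(report: float, belief: float) -> float:
    """Expected 0-1 score (higher is better) under an assumed definition:
    1 point if the outcome rated more likely occurs; a 0.5 report is treated
    as a coin flip."""
    if report > 0.5:
        return belief
    if report < 0.5:
        return 1 - belief
    return 0.5


if __name__ == "__main__":
    belief = 0.7
    reports = [i / 100 for i in range(101)]

    # Proper rule: a single optimal report, equal to the belief.
    best_brier = min(reports, key=lambda r: expected_brier(r, belief))

    # Non-proper rule: many reports tie for the best expected score.
    best_01 = max(expected_zero_one(r, belief) for r in reports)
    tied_01 = [r for r in reports if expected_zero_one(r, belief) == best_01]

    print(f"Brier-optimal report: {best_brier}")
    print(f"0-1-optimal reports: {len(tied_01)} values "
          f"from {tied_01[0]} to {tied_01[-1]}")
```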

Study 4 examined the effect of proper scoring rules on general-pool participants in the context of training interventions. The protocol did not use GaTAC; instead, pre-defined trajectories were shown to participants as PowerPoint slides.

The final study, Study 5, analyzed the effects of two independent variables on probability judgments in our strategic game setting: use of a proper scoring rule, namely the Brier scoring rule, and a training intervention. As in Study 4, the protocol did not use GaTAC; pre-defined trajectories were instead shown to participants as PowerPoint slides. The participant briefing was improved to include a demonstration of how the Brier scoring rule operates, making the rule easier to understand.
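The sketch below shows the kind of demonstration such a briefing can include: it computes a summed Brier score for one trial. For illustration, it assumes the following:

- A trial consists of several decision points.
- At each decision point the participant reports the probability of reaching the target.
- The trial's score is the sum of squared differences between those reports and the outcome (1 for a win, 0 for a loss), so lower is better.

The per-trial sum mirrors the "summed Brier score per trial" plotted in Figure 2. The exact scoring and payout used in Study 5 are not reproduced here.

```python
from typing import Sequence


def summed_brier(assessments: Sequence[float], won: bool) -> float:
    """Sum of squared errors between each probability report and the outcome."""
    outcome = 1.0 if won else 0.0
    return sum((p - outcome) ** 2 for p in assessments)


if __name__ == "__main__":
    reports = [0.6, 0.7, 0.9]  # hypothetical reports at three decision points
    print(f"if the trial is won:   {summed_brier(reports, won=True):.2f}")
    print(f"if the trial is lost:  {summed_brier(reports, won=False):.2f}")
    print(f"hedged 0.5 throughout: {summed_brier([0.5] * 3, won=True):.2f}")
```

The example illustrates three cases:

- Confident reports that turn out correct score close to 0.
- Confident reports that turn out wrong are penalized heavily.
- A uniform 0.5 scores in between, whatever the outcome.

This is the intuition a briefing demonstration would aim to convey.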


Results 


We completed running and analyzing all five studies during the grant period. We recruited a total of 467 participants, of whom 28 were from the ROTC pool and the remaining 439 were from the general Psychology Research Pool.

In Study 1, we did not observe any systematic inflationary or deflationary bias in participants' expressions of uncertainty. However, the significantly smaller interval for ROTC participants, despite their small sample, indicates a much better-behaved population in the context of this experiment than the general research pool.

In Study 2, which used predefined trajectories of the participant's UAV, monetary incentivization using a non-proper scoring rule, 0-1 scoring, resulted in a significant increase in payout and, consequently, better-calibrated probability assessments. However, a proper scoring rule, Brier scoring, did not show a similar benefit. In Study 3, we did not observe a main effect of any of the scoring rules for the ROTC pool. We therefore did not replicate the main effect of 0-1 scoring on probability assessments observed in Study 2.

In Study 4, we did not observe a significant effect of the proper scoring rule among participants, despite the use of interventions.

The results of Study 5 demonstrated that probability assessments can come under the control of incentives, provided those incentives are well understood, even under conditions of extreme ambiguity. We observed a statistically significant main effect of Brier scoring, a proper scoring rule (F(1, 112) = 7.83, p = .006, partial η² = .065; see Fig. 2 in the attached PowerPoint file and the table below). However, neither the main effect of the intervention nor the interaction between intervention and Brier scoring was significant.


Main ANOVA results of Study 5.


Source                        df    Type III SS   Mean Square   F value   p         Partial η²
Intervention                  1     0.01062       0.01062       0.61      0.44      .006
Brier_Score                   1     0.13573       0.13573       7.83      0.006**   .065
Intervention × Brier_Score    1     0.00925       0.00925       0.53      0.47      .005
Residual                      112   1.94153       0.01734
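The table can be checked against two standard fixed-effects ANOVA relations. Each F value is the effect mean square divided by the residual mean square (1.94153 / 112). Partial η² is SS_effect / (SS_effect + SS_residual). The short script below recomputes both quantities from the tabulated sums of squares and the residual degrees of freedom. It uses only values from the table and introduces no new data. For the Brier_Score row it reproduces F(1, 112) ≈ 7.83 and partial η² ≈ .065.

```python
from typing import Tuple

# Sums of squares transcribed from the Study 5 ANOVA table (each effect has df = 1).
EFFECT_SS = {
    "Intervention": 0.01062,
    "Brier_Score": 0.13573,
    "Intervention x Brier_Score": 0.00925,
}
SS_RESIDUAL = 1.94153
DF_RESIDUAL = 112


def f_and_partial_eta_sq(ss_effect: float, df_effect: int = 1) -> Tuple[float, float]:
    """Return (F, partial eta squared) for one effect row."""
    ms_residual = SS_RESIDUAL / DF_RESIDUAL
    f_value = (ss_effect / df_effect) / ms_residual
    partial_eta_sq = ss_effect / (ss_effect + SS_RESIDUAL)
    return f_value, partial_eta_sq


if __name__ == "__main__":
    print(f"residual mean square: {SS_RESIDUAL / DF_RESIDUAL:.5f}")
    for name, ss in EFFECT_SS.items():
        f_value, eta = f_and_partial_eta_sq(ss)
        print(f"{name:28s} F(1, {DF_RESIDUAL}) = {f_value:.2f}  "
              f"partial eta^2 = {eta:.3f}")
```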

Computational modeling of assessment data


An analysis of trends in participants' judged probabilities of successfully reaching the target in Study 4 indicates a learning effect both across decision points within a trial and across successive trials. We therefore sought to model this learning computationally using a process-oriented, generative model with psychological plausibility.

Our results demonstrated that descriptive reinforcement learning with cognitive biases comes close to modeling human judgments in contexts with delayed reinforcement, but leaves room for further improvement (see Fig. 3 in the attached PowerPoint file). Certain behaviors are challenging to model computationally, such as participants lowering their assessments in the later stages. This observation illuminates a pitfall of reinforcement learning, one with precedent: model assessments may propagate too slowly to match the data precisely. While participants may change their assessments quickly, temporal difference learning requires several iterations before a dramatic change becomes visible.
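The slow-propagation pitfall can be seen in the textbook tabular TD(0) update, V(s_k) <- V(s_k) + alpha * [r + gamma * V(s_{k+1}) - V(s_k)]. The sketch applies this update to a trial whose only reinforcement arrives at the end. It is a generic illustration, not the descriptive model with cognitive biases used in this work. The following settings are assumptions:

- four decision points
- learning rate alpha = 0.5
- gamma = 1
- an initial assessment of 0.5

```python
from typing import List


def td0_trial(values: List[float], reward: float,
              alpha: float = 0.5, gamma: float = 1.0) -> List[float]:
    """Apply one online TD(0) pass over a trial, updating `values` in place.

    values[k] is the model's probability assessment at decision point k.
    Reinforcement (1 for a win, 0 for a loss) arrives only on the transition
    out of the last decision point, which leads to a terminal state valued 0.
    """
    n = len(values)
    for k in range(n):
        if k == n - 1:
            target = reward  # terminal transition: reward only, no successor
        else:
            target = gamma * values[k + 1]  # no intermediate reward
        values[k] += alpha * (target - values[k])
    return values


if __name__ == "__main__":
    assessments = [0.5, 0.5, 0.5, 0.5]
    for trial in range(1, 5):
        td0_trial(assessments, reward=1.0)
        print(trial, assessments)
```

After each simulated trial, the increase reaches only one more decision point back from the end. With four decision points, the first assessment therefore stays at its initial 0.5 after three winning trials. A participant, by contrast, can revise every assessment after a single trial.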

Technology Transfer

As part of technology transfer outreach, PI Prashant Doshi gave a presentation titled "Evaluating the Validity of Probability Assessments in Strategic and Realistic Environments" at the US Army Conference on Applied Statistics (ACAS), held in Annapolis, MD, on October 19, 2011. Co-PI Adam Goodie gave a presentation titled "The Role of Incentive Schemes in Probability Assessment Under Uncertainty" at ACAS, held in Monterey, CA, on October 26, 2012.

The PI provided a brief technical consultation to John Nierwinski Jr. of the US Army Materiel Systems Analysis Activity (AMSAA) on combining information from multiple subject matter experts.
- Operators of vehicles such as UAVs may generally assess probabilities of their predictions at levels not objectively justified

- Uncertainty in realistic settings is difficult to judge and not objectively quantified

• Existing approaches inapplicable


Probability judgments

- No consensus on key issues related to probability judgment

- Domain-general approaches for valid probability assessments may not apply to complex military contexts

• Prior studies used simplistic settings (not strategic)
Figure 1. Architecture of the Georgia testbed for control of autonomous vehicles (GaTAC).
Figure 2. Average summed Brier score per trial in Study 5 with and without review interventions. We also compare the observed Brier scores with those that result from baseline assessments.
Figure 3. Average probability assessment in trials that led to wins (left) and losses (right), as observed in the data and as predicted by our computational models. (Both panels plot probability against trial.)


